Text Classification Based on Deep Textual Parsing
نویسندگان
چکیده
The problem of classifying text based on the deep parsing structure is addressed. An algorithm for document classification tasks where counts of words or n-grams is insufficient is proposed. The parse tree kernel method at the level of paragraphs, based on anaphora, rhetoric structure relations and communicative actions linking phrases in the parse thicket is considered.
منابع مشابه
Using Fuzzy LR Numbers in Bayesian Text Classifier for Classifying Persian Text Documents
Text Classification is an important research field in information retrieval and text mining. The main task in text classification is to assign text documents in predefined categories based on documents’ contents and labeled-training samples. Since word detection is a difficult and time consuming task in Persian language, Bayesian text classifier is an appropriate approach to deal with different...
متن کاملUsing Fuzzy LR Numbers in Bayesian Text Classifier for Classifying Persian Text Documents
Text Classification is an important research field in information retrieval and text mining. The main task in text classification is to assign text documents in predefined categories based on documents’ contents and labeled-training samples. Since word detection is a difficult and time consuming task in Persian language, Bayesian text classifier is an appropriate approach to deal with different...
متن کاملUsing Dependency Parses to Augment Feature Construction for Text Mining
(ABSTRACT) With the prevalence of large data stored in the cloud, including unstructured information in the form of text, there is now an increased emphasis on text mining. A broad range of techniques are now used for text mining, including algorithms adapted from machine learning, NLP, computational linguistics, and data mining. Applications are also multi-fold, including classification, clust...
متن کاملText Mining : Approaches and Applications
The field of text mining seeks to extract useful information from unstructured textual data through the identification and exploration of interesting patterns. The techniques employed usually do not involve deep linguistic analysis or parsing, but rely on simple “bag-of-words” text representations based on vector space. Several approaches to the identification of patterns are discussed, includi...
متن کاملShallow, Deep and Hybrid Processing with UIMA and Heart of Gold
The Unstructured Information Management Architecture (UIMA) is a generic platform for processing text and other unstructured, human-generated data. For text, it has been proposed and is being used mainly for shallow natural language processing (NLP) tasks such as part-of-speech tagging, chunking, named entity recognition and shallow parsing. However, it is commonly accepted that getting interes...
متن کامل